11. Char-RNN, Solution
07 CharRNN Solution V1
Representing Memory
You’ve learned that RNN’s work well for sequences of data because they have a kind of memory. This memory is represented by something called the hidden state .
In the character-level LSTM example, each LSTM cell, in addition to accepting a character as input and generating an output character, also has some hidden state, and each cell will pass along its hidden state to the next cell.
This connection creates a kind of memory by which a series of cells can remember which characters they’ve just seen and use that information to inform the next prediction!
For example, if a cell has just generated the character
a
it likely will
not
generate another
a
, right after that!
net.eval()
There is an omission in the above code: including
net.eval()
!
net.eval(
) will set all the layers in your model to evaluation mode. This affects layers like dropout layers that turn "off" nodes during training with some probability, but should allow every node to be "on" for evaluation. So, you should set your model to evaluation mode
before testing or validating your model
, and before, for example, sampling and making predictions about the likely next character in a given sequence. I'll set net.train()` (training mode) only during the training loop.
This is reflected in the previous notebook code and in our Github repository .